Results 1 - 20 of 61
1.
bioRxiv ; 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38585907

ABSTRACT

The biological process of RNA translation is fundamental to cellular life and has wide-ranging implications for human disease. Yet, accurately delineating the variation in RNA translation represents a significant challenge. Here, we develop RiboTIE, a transformer model-based approach to map global RNA translation. We find that RiboTIE offers unparalleled precision and sensitivity for ribosome profiling data. Application of RiboTIE to normal brain and medulloblastoma cancer samples enables high-resolution insights into disease regulation of RNA translation.

2.
NAR Genom Bioinform ; 5(1): lqad021, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36879896

ABSTRACT

The correct mapping of the proteome is an important step towards advancing our understanding of biological systems and cellular mechanisms. Methods that provide better mappings can fuel important processes such as drug discovery and disease understanding. Currently, true determination of translation initiation sites is primarily achieved by in vivo experiments. Here, we propose TIS Transformer, a deep learning model for the determination of translation start sites solely utilizing the information embedded in the transcript nucleotide sequence. The method is built upon deep learning techniques first designed for natural language processing. We prove this approach to be best suited for learning the semantics of translation, outperforming previous approaches by a large margin. We demonstrate that limitations in the model performance are primarily due to the presence of low-quality annotations against which the model is evaluated. Advantages of the method are its ability to detect key features of the translation process and multiple coding sequences on a transcript. These include micropeptides encoded by short Open Reading Frames, either alongside a canonical coding sequence or within long non-coding RNAs. To demonstrate the use of our methods, we applied TIS Transformer to remap the full human proteome.
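
To make the prediction task concrete, the following minimal Python sketch enumerates the candidate ATG start sites on a transcript together with the open reading frame each one would open. It only illustrates the search space a sequence-based predictor has to score; it is not the TIS Transformer model, and the function name `candidate_orfs` and its parameters are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_orfs(transcript, min_codons=1):
    """Return (start, end, n_codons) for every ATG with an in-frame stop codon.

    start/end are 0-based, end-exclusive transcript coordinates (stop included);
    n_codons counts the codons before the stop. Illustrative helper only.
    """
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA alphabet
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, pos + 3, n_codons))
                break
    return orfs


if __name__ == "__main__":
    # Two ATGs are present; only the first has an in-frame stop codon.
    print(candidate_orfs("CCATGAAATGGTAGCC"))
```

Several candidates can share one transcript, which is why a per-position score, rather than a single call per transcript, is needed to recover short ORFs alongside a canonical coding sequence.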

5.
Bioinformatics ; 38(3): 597-603, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34718418

ABSTRACT

MOTIVATION: The adoption of current single-cell DNA methylation sequencing protocols is hindered by incomplete coverage, outlining the need for effective imputation techniques. The task of imputing single-cell (methylation) data requires models to build an understanding of underlying biological processes. RESULTS: We adapt the transformer neural network architecture to operate on methylation matrices through combining axial attention with sliding window self-attention. The obtained CpG Transformer displays state-of-the-art performances on a wide range of scBS-seq and scRRBS-seq datasets. Furthermore, we demonstrate the interpretability of CpG Transformer and illustrate its rapid transfer learning properties, allowing practitioners to train models on new datasets with a limited computational and time budget. AVAILABILITY AND IMPLEMENTATION: CpG Transformer is freely available at https://github.com/gdewael/cpg-transformer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
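
The attention pattern described in the RESULTS can be sketched in a few lines of Python. The sketch below lists which entries of a cells × CpG-sites matrix a single entry may attend to when a sliding window is used along the site axis and full (axial) attention along the cell axis. Which axis receives which attention type, the window size, and the name `attended_positions` are assumptions made for illustration; the repository linked above is the authoritative implementation.

```python
def attended_positions(n_cells, n_sites, cell, site, window=1):
    """Return the set of (cell, site) entries visible to entry (cell, site).

    Assumed pattern: sliding-window attention along the site axis (same cell)
    plus full axial attention along the cell axis (same site).
    """
    lo = max(0, site - window)
    hi = min(n_sites, site + window + 1)
    same_cell = {(cell, s) for s in range(lo, hi)}
    same_site = {(c, site) for c in range(n_cells)}
    return same_cell | same_site


if __name__ == "__main__":
    visible = attended_positions(n_cells=3, n_sites=5, cell=1, site=2, window=1)
    print(sorted(visible))
    print(len(visible), "of", 3 * 5, "entries are attended to")
```

Restricting each entry to one row window and one column keeps the number of attended entries far below the full matrix size, which is the usual motivation for combining axial and windowed attention.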


Subjects
DNA Methylation; Epigenome; Base Sequence; Sequence Analysis, DNA/methods; Neural Networks, Computer
6.
Front Genet ; 12: 728900, 2021.
Article in English | MEDLINE | ID: mdl-34759956

ABSTRACT

Transcriptome and ribosome sequencing have revealed the existence of many non-canonical transcripts, mainly containing splice variants, ncRNA, sORFs and altORFs. However, identification and characterization of products that may be translated out of these remains a challenge. Addressing this, we here report on 552 non-canonical proteins and splice variants in the model organism C. elegans using tandem mass spectrometry. Aided by sequencing-based prediction, we generated a custom proteome database tailored to search for non-canonical translation products of C. elegans. Using this database, we mined available mass spectrometric resources of C. elegans, from which 51 novel, non-canonical proteins could be identified. Furthermore, we utilized diverse proteomic and peptidomic strategies to detect 40 novel non-canonical proteins in C. elegans by LC-TIMS-MS/MS, of which 6 were common with our meta-analysis of existing resources. Together, this permits us to provide a resource with detailed annotation of 467 splice variants and 85 novel proteins mapped onto UTRs, non-coding regions and alternative open reading frames of the C. elegans genome.
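
The counts reported in this abstract are mutually consistent, as the short check below shows: the novel proteins from the meta-analysis and from LC-TIMS-MS/MS are combined after subtracting the 6 found by both, and the result is added to the splice variants.

```latex
\begin{align*}
51 + 40 - 6 &= 85  && \text{novel non-canonical proteins} \\
467 + 85    &= 552 && \text{non-canonical proteins and splice variants in total}
\end{align*}
```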

7.
Front Cell Dev Biol ; 9: 720570, 2021.
Article in English | MEDLINE | ID: mdl-34604223

ABSTRACT

Bioactive peptides exhibit key roles in a wide variety of complex processes, such as regulation of body weight, learning, aging, and innate immune response. Next to the classical bioactive peptides, emerging from larger precursor proteins by specific proteolytic processing, a new class of peptides originating from small open reading frames (sORFs) have been recognized as important biological regulators. But their intrinsic properties, specific expression pattern and location on presumed non-coding regions have hindered the full characterization of the repertoire of bioactive peptides, despite their predominant role in various pathways. Although the development of peptidomics has offered the opportunity to study these peptides in vivo, it remains challenging to identify the full peptidome as the lack of cleavage enzyme specification and large search space complicate conventional database search approaches. In this study, we introduce a proteogenomics methodology using a new type of mass spectrometry instrument and the implementation of machine learning tools toward improved identification of potential bioactive peptides in the mouse brain. The application of trapped ion mobility spectrometry (TIMS) coupled to a time-of-flight mass analyzer (TOF) offers improved sensitivity, an enhanced peptide coverage, reduction in chemical noise and the reduced occurrence of chimeric spectra. Subsequently, the machine learning tools MS2PIP, which predicts fragment ion intensities, and DeepLC, which predicts retention times, improve the database searching based on a large and comprehensive custom database containing both sORFs and alternative ORFs. Finally, the identification of peptides is further enhanced by applying the post-processing semi-supervised learning tool Percolator.
Applying this workflow, the first peptidomics workflow combined with spectral intensity and retention time predictions, we identified a total of 167 predicted sORF-encoded peptides (SEPs), of which 48 originate from presumed non-coding locations, next to 401 peptides from known neuropeptide precursors, linked to 66 annotated bioactive neuropeptides from within 22 different families. Additional PEAKS analysis expanded the pool of SEPs on presumed non-coding locations to 84, while an additional 204 peptides completed the list of peptides from neuropeptide precursors. Altogether, this study provides insights into a new robust pipeline that fuses technological advancements from different fields, ensuring an improved coverage of the neuropeptidome in the mouse brain.

8.
Mol Cell Proteomics ; 20: 100076, 2021.
Article in English | MEDLINE | ID: mdl-33823297

ABSTRACT

Proteogenomics approaches often struggle with the distinction between true and false peptide-to-spectrum matches as the database size enlarges. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and can provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed out of ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate as well as the validation stringency in a proteogenomic setting.


Subjects
Proteogenomics/methods; Databases, Protein; HCT116 Cells; Humans; Machine Learning; RNA-Seq; Ribosomes
9.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33834200

ABSTRACT

The effectiveness of deep learning methods can be largely attributed to the automated extraction of relevant features from raw data. In the field of functional genomics, this generally concerns the automatic selection of relevant nucleotide motifs from DNA sequences. To benefit from automated learning methods, new strategies are required that unveil the decision-making process of trained models. In this paper, we present a new approach that has been successful in gathering insights on the transcription process in Escherichia coli. This work builds upon a transformer-based neural network framework designed for prokaryotic genome annotation purposes. We find that the majority of subunits (attention heads) of the model are specialized towards identifying transcription factors and are able to successfully characterize both their binding sites and consensus sequences, uncovering both well-known and potentially novel elements involved in the initiation of the transcription process. With the specialization of the attention heads occurring automatically, we believe transformer models to be of high interest towards the creation of explainable neural networks in this field.


Subjects
Deep Learning; Escherichia coli/genetics; Genome, Bacterial; Genomics/methods; Transcription Initiation Site; Base Sequence; Binding Sites; DNA, Bacterial/genetics; DNA, Bacterial/metabolism; Escherichia coli/metabolism; Promoter Regions, Genetic/genetics; Transcription Factors/genetics; Transcription Factors/metabolism
10.
Nat Cancer ; 2(6): 611-628, 2021 06.
Article in English | MEDLINE | ID: mdl-35121941

ABSTRACT

Post-transcriptional modifications of RNA constitute an emerging regulatory layer of gene expression. The demethylase fat mass- and obesity-associated protein (FTO), an eraser of N6-methyladenosine (m6A), has been shown to play a role in cancer, but its contribution to tumor progression and the underlying mechanisms remain unclear. Here, we report widespread FTO downregulation in epithelial cancers associated with increased invasion, metastasis and worse clinical outcome. Both in vitro and in vivo, FTO silencing promotes cancer growth, cell motility and invasion. In human-derived tumor xenografts (PDXs), FTO pharmacological inhibition favors tumorigenesis. Mechanistically, we demonstrate that FTO depletion elicits an epithelial-to-mesenchymal transition (EMT) program through increased m6A and altered 3'-end processing of key mRNAs along the Wnt signaling cascade. Accordingly, FTO knockdown acts via EMT to sensitize mouse xenografts to Wnt inhibition. We thus identify FTO as a key regulator, across epithelial cancers, of Wnt-triggered EMT and tumor progression and reveal a therapeutically exploitable vulnerability of FTO-low tumors.


Subjects
Neoplasms, Glandular and Epithelial; RNA; Alpha-Ketoglutarate-Dependent Dioxygenase FTO/genetics; Animals; Down-Regulation/genetics; Epithelial-Mesenchymal Transition/genetics; Humans; Mice
11.
Nat Commun ; 11(1): 4956, 2020 10 02.
Article in English | MEDLINE | ID: mdl-33009383

ABSTRACT

Tet-enzyme-mediated 5-hydroxymethylation of cytosines in DNA plays a crucial role in mouse embryonic stem cells (ESCs). In RNA also, 5-hydroxymethylcytosine (5hmC) has recently been evidenced, but its physiological roles are still largely unknown. Here we show the contribution and function of this mark in mouse ESCs and differentiating embryoid bodies. Transcriptome-wide mapping in ESCs reveals hundreds of messenger RNAs marked by 5hmC at sites characterized by a defined unique consensus sequence and particular features. During differentiation a large number of transcripts, including many encoding key pluripotency-related factors (such as Eed and Jarid2), show decreased cytosine hydroxymethylation. Using Tet-knockout ESCs, we find Tet enzymes to be partly responsible for deposition of 5hmC in mRNA. A transcriptome-wide search further reveals mRNA targets to which Tet1 and Tet2 bind, at sites showing a topology similar to that of 5hmC sites. Tet-mediated RNA hydroxymethylation is found to reduce the stability of crucial pluripotency-promoting transcripts. We propose that RNA cytosine 5-hydroxymethylation by Tets is a mark of transcriptome flexibility, inextricably linked to the balance between pluripotency and lineage commitment.


Subjects
5-Methylcytosine/analogs & derivatives; Cell Differentiation; DNA-Binding Proteins/metabolism; Mouse Embryonic Stem Cells/cytology; Mouse Embryonic Stem Cells/metabolism; Proto-Oncogene Proteins/metabolism; RNA/metabolism; 5-Methylcytosine/metabolism; Animals; Antibody Specificity/immunology; Base Sequence; Dioxygenases; Embryoid Bodies/metabolism; Mice; Models, Biological; Pluripotent Stem Cells/metabolism; Protein Binding; RNA Stability/genetics; RNA, Messenger/genetics; RNA, Messenger/metabolism; Transcriptome/genetics
12.
Nat Commun ; 11(1): 1312, 2020 03 11.
Article in English | MEDLINE | ID: mdl-32161263

ABSTRACT

The emergence of small open reading frame (sORF)-encoded peptides (SEPs) is rapidly expanding the known proteome at the lower end of the size distribution. Here, we show that the mitochondrial proteome, particularly the respiratory chain, is enriched for small proteins. Using a prediction and validation pipeline for SEPs, we report the discovery of 16 endogenous nuclear encoded, mitochondrial-localized SEPs (mito-SEPs). Through functional prediction, proteomics, metabolomics and metabolic flux modeling, we demonstrate that BRAWNIN, a 71 a.a. peptide encoded by C12orf73, is essential for respiratory chain complex III (CIII) assembly. In human cells, BRAWNIN is induced by the energy-sensing AMPK pathway, and its depletion impairs mitochondrial ATP production. In zebrafish, Brawnin deletion causes complete CIII loss, resulting in severe growth retardation, lactic acidosis and early death. Our findings demonstrate that BRAWNIN is essential for vertebrate oxidative phosphorylation. We propose that mito-SEPs are an untapped resource for essential regulators of oxidative metabolism.


Subjects
Electron Transport Complex III/metabolism; Mitochondria/metabolism; Mitochondrial Proteins/metabolism; Oxidative Phosphorylation; Peptides/metabolism; Zebrafish Proteins/metabolism; Acidosis, Lactic/genetics; Animals; Animals, Genetically Modified; Disease Models, Animal; Female; Gene Knockdown Techniques; Growth Disorders/genetics; Humans; Male; Metabolomics; Mitochondrial Proteins/genetics; Models, Animal; Models, Biological; Open Reading Frames/genetics; Peptides/genetics; Proteomics; Zebrafish/genetics; Zebrafish/growth & development; Zebrafish Proteins/genetics
13.
Exp Cell Res ; 391(1): 111923, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32135166

ABSTRACT

Growing evidence illustrates the shortcomings in the current understanding of the full complexity of the proteome. Previously overlooked small open reading frames (sORFs) and their encoded microproteins have filled important gaps, exerting their function as biologically relevant regulators. The characterization of the full small proteome has potential applications in many fields. Continuous development of techniques and tools has led to improved sORF discovery, with candidates originating from bioinformatics analyses, sequencing routines or proteomics approaches. In this mini review, we discuss the ongoing trends in these three fields and suggest some strategies for further characterization of high-potential candidates.


Subjects
Computational Biology/statistics & numerical data; Neural Networks, Computer; Open Reading Frames; Protein Biosynthesis; Proteome/genetics; Ribosomes/genetics; Animals; Computational Biology/methods; High-Throughput Nucleotide Sequencing; Humans; Plants/genetics; Protein Sorting Signals/genetics; Proteome/classification; Proteome/metabolism; Ribosomes/classification; Ribosomes/metabolism; Software
14.
Neurosci Res ; 151: 31-37, 2020 Feb.
Article in English | MEDLINE | ID: mdl-30862443

ABSTRACT

Brain-derived peptides function as signaling molecules in the brain and regulate various physiological and behavioral processes. The low abundance and atypical fragmentation of these brain-derived peptides make detection using traditional proteomic methods challenging. In this study, we introduce and validate a new methodology for the discovery of novel peptides derived from mammalian brain. This methodology combines ribosome profiling and mass spectrometry-based peptidomics. Using this framework, we have identified a novel peptide in mouse whole brain whose expression is highest in the basal ganglia, hypothalamus and amygdala. Although its functional role is unknown, it has been previously detected in peripheral tissue as a component of the mRNA decapping complex. Continued discovery and studies of novel regulating peptides in mammalian brain may also provide insight into brain disorders.


Subjects
Neuropeptides/isolation & purification; Proteomics/methods; Animals; Brain/metabolism; Male; Mass Spectrometry; Mice; Mice, Inbred C57BL; Neuropeptides/analysis; Peptides; Ribosomes; Sequence Analysis, Protein
15.
PLoS One ; 14(9): e0215185, 2019.
Article in English | MEDLINE | ID: mdl-31545805

ABSTRACT

Neuropeptides are a class of bioactive peptides shown to be involved in various physiological processes, including metabolism, development, and reproduction. Although neuropeptide candidates have been predicted from genomic and transcriptomic data, comprehensive characterization of neuropeptide repertoires remains a challenge owing to their small size and variable sequences. De novo prediction of neuropeptides from genome or transcriptome data is difficult and usually only efficient for those peptides that have identified orthologs in other animal species. Recent peptidomics technology has enabled systematic structural identification of neuropeptides by using the combination of liquid chromatography and tandem mass spectrometry. However, reliable identification of naturally occurring peptides using a conventional tandem mass spectrometry approach, scanning spectra against a protein database, remains difficult because a large search space must be scanned due to the absence of a cleavage enzyme specification. We developed a pipeline consisting of in silico prediction of candidate neuropeptides followed by peptide-spectrum matching. This approach enables highly sensitive and reliable neuropeptide identification, as the search space for peptide-spectrum matching is highly reduced. Nematostella vectensis is a basal eumetazoan with one of the most ancient nervous systems. We scanned the Nematostella protein database for sequences displaying structural hallmarks typical of eumetazoan neuropeptide precursors, including amino- and carboxyterminal motifs and associated modifications. Peptide-spectrum matching was performed against a dataset of peptides that are cleaved in silico from these putative peptide precursors. 
The dozens of newly identified neuropeptides display structural similarities to bilaterian neuropeptides including tachykinin, myoinhibitory peptide, and neuromedin-U/pyrokinin, suggesting these neuropeptides occurred in the eumetazoan ancestor of all animal species.
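
The in silico cleavage step described in this abstract can be illustrated with a small Python sketch. It splits a precursor at dibasic sites and treats a C-terminal glycine as an amidation signal, two textbook hallmarks of neuropeptide processing. The exact motifs and modifications used in the published pipeline are not given in the abstract, so the rules, the toy precursor sequence and the name `cleave_precursor` are assumptions for illustration only.

```python
import re

# Assumed cleavage rule: any pair of basic residues (K/R) marks a processing site.
DIBASIC = re.compile(r"KR|RR|RK|KK")


def cleave_precursor(precursor, min_len=3):
    """Return a list of (peptide, amidated) tuples from a precursor sequence.

    A fragment ending in G is reported without the glycine and flagged as
    C-terminally amidated (assumed rule). Fragments shorter than min_len
    residues are discarded before the glycine is removed.
    """
    peptides = []
    for fragment in DIBASIC.split(precursor):
        if len(fragment) < min_len:
            continue
        amidated = fragment.endswith("G")
        if amidated:
            fragment = fragment[:-1]
        peptides.append((fragment, amidated))
    return peptides


if __name__ == "__main__":
    # Toy precursor, not a real Nematostella sequence.
    for peptide, amidated in cleave_precursor("MAAAKRFLRFGKRSSPQW"):
        print(peptide, "(amidated)" if amidated else "")
```

Matching spectra only against peptides generated this way, instead of against every possible subsequence of every protein, is what shrinks the search space when no cleavage enzyme can be specified.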


Subjects
Evolution, Molecular; Neuropeptides/genetics; Sea Anemones/chemistry; Sea Anemones/genetics; Tandem Mass Spectrometry; Amino Acid Sequence; Animals; Computational Biology/methods; Conserved Sequence; Databases, Genetic; Gene Expression; Neuropeptides/chemistry; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
16.
Genes (Basel) ; 10(9)2019 09 05.
Article in English | MEDLINE | ID: mdl-31492022

ABSTRACT

The increasing availability of high throughput proteomics data provides us with opportunities as well as posing new ethical challenges regarding data privacy and re-identifiability of participants. Moreover, the fact that proteomics represents a level between the genotype and the phenotype further exacerbates the situation, introducing dilemmas related to publicly available data, anonymization, ownership of information and incidental findings. In this paper, we try to differentiate proteomics from genomics data and cover the ethical challenges related to proteomics data sharing. Finally, we give an overview of the proposed solutions and the outlook for future studies.


Subjects
Genetic Privacy/standards; Precision Medicine/ethics; Proteomics/ethics; Humans; Informed Consent/standards; Precision Medicine/methods; Proteomics/methods
17.
Article in English | MEDLINE | ID: mdl-31238262

ABSTRACT

On average, a human cell type expresses around 10,000 different protein-coding genes, synthesizing all the different molecular forms of the protein product (proteoforms) found in a cell. In a typical shotgun bottom-up proteomic approach, the proteins are enzymatically cleaved, producing several hundred thousand different peptides that are analyzed with liquid chromatography-tandem mass spectrometry (LC-MS/MS). One of the major consequences of this high sample complexity is that coelution of peptides cannot be avoided. Moreover, low-abundance peptides are difficult to identify as they have a lower chance of being selected for fragmentation due to ion-suppression effects and the semi-stochastic nature of the precursor selection in data-dependent shotgun proteomic analysis, where peptides are selected for fragmentation analysis one by one as they elute from the column. In the current study we explore a simple novel approach that has the potential to counter some of the effects of coelution of peptides and improve the number of peptide identifications in a bottom-up proteomic analysis. In this method, peptides from a HeLa cell digest were eluted from the reversed-phase column using three different elution solvents (acetonitrile, methanol and acetone) in three replicate reversed-phase LC-MS/MS shotgun proteomic analyses. Results were compared with three technical replicates using the same solvent, which is common practice in proteomic analysis. In total, we see an increase of up to 10% in unique protein and up to 30% in unique peptide identifications from the combined analysis using different elution solvents when compared to the combined identifications from the three replicates of the same solvent. In addition, the overlap of unique peptide identifications common to all three LC-MS analyses in our approach is only 23%, compared to 50% in the replicates using the same solvent.
The method presented here thus provides an easy to implement method to significantly reduce the effects of coelution and ion suppression of peptides and improve protein coverage in shotgun proteomics. Data are available via ProteomeXchange with identifier PXD011908.
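
The overlap figures quoted above (23% versus 50%) can be reproduced from identification lists with a simple set computation. The Python sketch below assumes that overlap is defined as the peptides found in every run divided by the peptides found in any run; the abstract does not state the denominator, so this definition, the toy peptide labels and the name `shared_fraction` are assumptions.

```python
def shared_fraction(runs):
    """Fraction of all unique peptides that were identified in every run.

    runs: iterable of collections of peptide identifiers, one per LC-MS run.
    Assumed definition: |intersection of runs| / |union of runs|.
    """
    sets = [set(run) for run in runs]
    if not sets:
        return 0.0
    union = set().union(*sets)
    common = set.intersection(*sets)
    return len(common) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy identifications for three runs with different elution solvents.
    acetonitrile = {"PEP_A", "PEP_B", "PEP_C"}
    methanol = {"PEP_A", "PEP_B", "PEP_D"}
    acetone = {"PEP_A", "PEP_E"}
    print(shared_fraction([acetonitrile, methanol, acetone]))
```

Under this definition a lower value means the runs are more complementary, which is the effect the different-solvent design aims for.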


Subjects
Chromatography, Liquid/methods; Proteome/chemistry; Proteomics/methods; Tandem Mass Spectrometry/methods; HeLa Cells; Humans; Peptides/chemistry
18.
Mol Cell Proteomics ; 18(8 suppl 1): S126-S140, 2019 08 09.
Article in English | MEDLINE | ID: mdl-31040227

ABSTRACT

PROTEOFORMER is a pipeline that enables the automated processing of data derived from ribosome profiling (RIBO-seq, i.e. the sequencing of ribosome-protected mRNA fragments). As such, genome-wide ribosome occupancies lead to the delineation of data-specific translation product candidates and these can improve the mass spectrometry-based identification. Since its first publication, different upgrades, new features and extensions have been added to the PROTEOFORMER pipeline. Some of the most important upgrades include P-site offset calculation during mapping, comprehensive data pre-exploration, the introduction of two alternative proteoform calling strategies and extended pipeline output features. These novelties are illustrated by analyzing ribosome profiling data of human HCT116 and Jurkat cells. The different proteoform calling strategies are used alongside one another and in the end combined with reference sequences from UniProt. Matching mass spectrometry data are searched against this extended search space with MaxQuant. Overall, besides annotated proteoforms, this pipeline leads to the identification and validation of different categories of new proteoforms, including translation products of up- and downstream open reading frames, 5' and 3' extended and truncated proteoforms, single amino acid variants, splice variants and translation products of so-called noncoding regions. Further, proof-of-concept is reported for the improvement of spectrum matching by including Prosit, a deep neural network strategy that adds extra fragmentation spectrum intensity features to the analysis. In the light of ribosome profiling-driven proteogenomics, it is shown that this allows validating the spectrum matches of newly identified proteoforms with elevated stringency. These updates and novel conclusions provide new insights and lessons for the ribosome profiling-based proteogenomic research field.
More practical information on the pipeline, raw code, the user manual (README) and explanations on the different modes of availability can be found at the GitHub repository of PROTEOFORMER: https://github.com/Biobix/proteoformer.


Subjects
Proteogenomics/methods; Ribosomes/metabolism; Chromatography, Liquid; HCT116 Cells; Humans; Jurkat Cells; Tandem Mass Spectrometry
19.
J Proteome Res ; 18(6): 2686-2692, 2019 06 07.
Article in English | MEDLINE | ID: mdl-31081335

ABSTRACT

Mass-spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs) in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. In order to facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI extended FASTA format (PEFF). PEFF is based on the very popular FASTA format but adds a uniform mechanism for encoding substantially more metadata about the sequence collection as well as individual entries, including support for encoding known sequence variants, PTMs, and proteoforms. The format is very nearly backward compatible, and as such, existing FASTA parsers will require little or no changes to be able to read PEFF files as FASTA files, although without supporting any of the extra capabilities of PEFF. PEFF is defined by a full specification document, controlled vocabulary terms, a set of example files, software libraries, and a file validator. Popular software and resources are starting to support PEFF, including the sequence search engine Comet and the knowledge bases neXtProt and UniProtKB. Widespread implementation of PEFF is expected to further enable proteogenomics and top-down proteomics applications by providing a standardized mechanism for encoding protein sequences and their known variations. All the related documentation, including the detailed file format specification and example files, are available at http://www.psidev.info/peff .
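
The near backward compatibility described above can be illustrated with a small Python sketch that reads a PEFF-style description line. It assumes the commonly shown layout of a `>prefix:accession` identifier followed by backslash-prefixed `\Key=Value` pairs; the example header, its key names and the function `parse_peff_header` are illustrative assumptions, and the specification at the URL above remains the authority on the format.

```python
def parse_peff_header(line):
    """Split a PEFF-style header into (prefix, accession, fields).

    Assumed layout: '>prefix:accession \\Key=Value \\Key=Value ...'.
    Values are kept as raw strings; no controlled-vocabulary validation is done.
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA/PEFF description line")
    head, *pairs = line[1:].rstrip().split(" \\")
    prefix, _, accession = head.strip().partition(":")
    fields = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        fields[key.strip()] = value.strip()
    return prefix, accession, fields


if __name__ == "__main__":
    # Hypothetical header written for this example, not copied from a real file.
    header = r">xx:ACC0001 \PName=Example protein \NcbiTaxId=9606 \Length=294"
    prefix, accession, fields = parse_peff_header(header)
    print(prefix, accession, fields)
    # A plain FASTA reader would simply take the first whitespace-delimited token.
    print(header[1:].split()[0])
```

Because the identifier is still the first token after `>`, a FASTA-only tool sees an ordinary entry and ignores the extra metadata, which matches the compatibility claim in the abstract.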


Subjects
Proteomics/standards; Humans; Information Storage and Retrieval; Mass Spectrometry; Software
20.
Nucleic Acids Res ; 47(6): e36, 2019 04 08.
Article in English | MEDLINE | ID: mdl-30753697

ABSTRACT

Annotation of gene expression in prokaryotes often finds itself corrected due to small variations of the annotated gene regions observed between different (sub)-species. It has become apparent that traditional sequence alignment algorithms, used for the curation of genomes, are not able to map the full complexity of the genomic landscape. We present DeepRibo, a novel neural network utilizing features extracted from ribosome profiling information and binding site sequence patterns that proves to be a precise tool for the delineation and annotation of expressed genes in prokaryotes. The neural network combines recurrent memory cells and convolutional layers, adapting the information gained from both the high-throughput ribosome profiling data and the ribosome binding translation initiation sequence region into one model. DeepRibo is designed as a single model trained on a variety of ribosome profiling experiments, used for the identification of open reading frames in prokaryotes without a priori knowledge of the translational landscape. Through extensive validation of the model trained on various sets of data, multiple species sequence similarity, mass spectrometry and Edman degradation verified proteins, the effectiveness of DeepRibo is highlighted.


Subjects
Algorithms; Molecular Sequence Annotation/methods; Prokaryotic Cells/metabolism; Protein Biosynthesis/physiology; Ribosomes/metabolism; Binding Sites; Computational Biology/methods; Datasets as Topic; High-Throughput Screening Assays/methods; Neural Networks, Computer; Open Reading Frames; Prokaryotic Cells/chemistry; Protein Processing, Post-Translational; Sequence Alignment/methods; Signal Transduction